Integrating very high resolution environmental proxies in genotype–environment association studies

Abstract Landscape genomic analyses associating genetic variation with environmental variables are powerful tools for studying molecular signatures of species' local adaptation and for detecting candidate genes under selection. The development of landscape genomics over the past decade has been spurred by improvements in resolutions of genomic and environmental datasets, allegedly increasing the power to identify putative genes underlying local adaptation in non‐model organisms. Although these associations have been successfully applied to numerous species across a diverse array of taxa, the spatial scale of environmental predictor variables has been largely overlooked, potentially limiting conclusions to be reached with these methods. To address this knowledge gap, we systematically evaluated performances of genotype–environment association (GEA) models using predictor variables at multiple spatial resolutions. Specifically, we used multivariate redundancy analyses to associate whole‐genome sequence data from the plant Arabis alpina L. collected across four neighboring valleys in the western Swiss Alps, with very high‐resolution topographic variables derived from digital elevation models of grain sizes between 0.5 m and 16 m. These comparisons highlight the sensitivity of landscape genomic models to spatial resolution, where the optimal grain sizes were specific to variable type, terrain characteristics, and study extent. To assist in selecting variables at appropriate spatial resolutions, we demonstrate a practical approach to produce, select, and integrate multiscale variables into GEA models. After generalizing fine‐grained variables to multiple spatial resolutions, a forward selection procedure is applied to retain only the most relevant variables for a particular context. Depending on the spatial resolution, the relevance for topographic variables in GEA studies calls for integrating multiple spatial scales into landscape genomic models. 
By carefully considering spatial resolutions, candidate genes under selection by a more realistic range of pressures can be detected for downstream analyses, with important applied implications for experimental research and conservation management of natural populations.

such that the independent contributions of each variable group can be separated from confounding effects due to collinearity among variable groups (Peres-Neto et al., 2006).
Here, the estimated independent effects of i) population structure, ii) spatial geographic structure (geography), and iii) local adaptation (environmental drivers) on observed intragenic SNP variation at each site were investigated, using the varpart function in the vegan R package.
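The fractions reported by such a partitioning can be written out explicitly. The following is a sketch only: the symbols P (population structure), G (geography), E (environment), n (number of samples) and m (number of explanatory variables in a given model) are notation introduced here for illustration and are not taken from the original text. It assumes that fractions are derived from adjusted R² values, as recommended by Peres-Neto et al. (2006).

```latex
% Adjusted coefficient of determination for a model with m explanatory variables
R^2_{\mathrm{adj}} = 1 - \frac{n - 1}{n - m - 1}\,\bigl(1 - R^2\bigr)

% Unique (independent) fraction of each variable group,
% obtained as the gain over the model that omits that group
[\,E \mid P, G\,] = R^2_{\mathrm{adj}}(P + G + E) - R^2_{\mathrm{adj}}(P + G)
[\,G \mid P, E\,] = R^2_{\mathrm{adj}}(P + G + E) - R^2_{\mathrm{adj}}(P + E)
[\,P \mid G, E\,] = R^2_{\mathrm{adj}}(P + G + E) - R^2_{\mathrm{adj}}(G + E)

% Variation that cannot be attributed to a single group (confounded)
\text{confounded} = R^2_{\mathrm{adj}}(P + G + E)
  - [\,E \mid P, G\,] - [\,G \mid P, E\,] - [\,P \mid G, E\,]
```

Under this notation, the confounded term collects all fractions shared between two or three variable groups, and the unexplained residual is 1 minus the adjusted R² of the full model.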
Neutral genetic structure resulting from demographic history was accounted for using PCAs of the LD-pruned intergenic SNP datasets, performed with the rda function in the vegan R package. The number of principal components (PCs) retained to represent population structure was determined visually using scree plots and biplots. Similarly, geography was accounted for using Moran Eigenvector Maps (MEMs), following Dray et al. (2006). Briefly, plant neighbours were triangulated using geographic coordinates (X, Y) to estimate weightings of MEMs using the graph2nb and nb2listw functions of the spdep R package (v.1.2.3; Bivand, 2022). Moran's I was calculated for each MEM eigenvector of the weighting matrix and tested with 999 permutations, using the scores.listw and test.scores functions of the spacemakeR R package (v.0.0-5/r113; Dray, 2013); only MEMs with a p-value < 0.01 were retained. The number of MEMs was further reduced using a forward selection procedure, retaining the MEMs that best explained variance in the neutral LD-pruned intergenic SNP dataset, with the adjusted R² value of the full RDA model as the stopping criterion. This step was performed using the forward.sel function of the adespatial R package (v.0.3-16; Dray et al., 2022).
The contribution of elevation and DEM-derived variables in shaping genetic variation and supporting a pattern of local adaptation was assessed using variables at a range of spatial resolutions. The effect of each of the variable sets (Table 2 in the main text) was systematically evaluated for the local and regional analyses. As elevation is known to be correlated with environmental variables such as temperature and humidity (Ashcroft and Gollan, 2013; Hof et al., 2012), it was removed from each of the variable sets and evaluated separately.

Results
Up to half of genomic variation across each local site was explained by either neutral processes, including demographic history (population structure) and spatial geographic structure (geography), or by adaptive processes yielding patterns of local adaptation (Suppl. Fig. S5; Suppl. Table S5), where most of the explained variation was confounded between neutral processes (Suppl. Fig. S6). Total explained genetic variation was predominantly influenced by the number of input environmental variables used in analyses, rather than the spatial resolution of the variable set (Table 2). At local sites, population structure, accounted for using PCAs of the neutral intergenic dataset (Suppl. Fig. S7a-d), explained relatively limited variance alone, as it was highly confounded with geography such as latitudinal coordinates (Suppl. Fig. S3). Geography (accounted for using MEMs; Suppl. Fig. S8; Suppl. Table S6) explained more intragenic variance at sites with homogeneous rather than heterogeneous terrains, where the effect of geography was stronger when modelled with finer-resolution VS-single models (Suppl. Fig. S5). After accounting for neutral spatial genetic structuring, elevation explained very little genetic variation on its own (generally <1%) at the local sites. Likewise, the environmental partition explained little genetic variation alone, where patterns of local adaptation were stronger at homogeneous than at heterogeneous terrain sites (Suppl. Fig. S5).
At the regional level, almost half of the total intragenic variation was explained by neutral or adaptive processes, half of which was confounded (Suppl. Fig. S3e; Suppl. Fig. S5). In contrast to the local sites, more than half of this unconfounded variance was shaped purely by geography and, to a much lesser extent, by population structure (Suppl. Fig. S7e). Patterns of local adaptation, however, remained weak at the regional level, with elevation accounting for <1% of explained genetic variation, and the remaining environmental variables accounting for <3% with VS-single and <15% with VS-fwd and VS-all.